Interpretable Deep Learning: Beyond Feature-Importance with Concept-based Explanations
Deep Neural Network (DNN) models are challenging to interpret because of their highly complex and non-linear nature. This lack of interpretability (1) inhibits adoption within safety-critical applications, (2) makes it challenging to debug existing models, and (3) prevents us from extracting valuable knowledge. Explainable AI (XAI) research aims to increase the transparency of DNN model behaviour and thereby improve interpretability. Feature importance explanations are the most popular interpretability approach: they show the importance of each input feature (e.g., pixel, patch, word vector) to the model’s prediction. However, we hypothesise that feature importance explanations have two main shortcomings: they cannot describe the complexity of DNN behaviour with sufficient (1) fidelity and (2) richness. Fidelity and richness are essential because different tasks, users, and data types require specific levels of trust and understanding.
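As a minimal sketch of what a feature importance explanation looks like in practice, the snippet below computes a simple gradient-based saliency map for an image classifier. It assumes a PyTorch model `model` and a single preprocessed input tensor `x` of shape (1, C, H, W); it illustrates the general idea only and is not a method from this thesis.

```python
import torch

def saliency_map(model, x, target_class):
    """Return |d logit_target / d input| as a per-pixel importance map.

    A minimal gradient-based feature-importance explanation: the absolute
    gradient of the target logit with respect to each input pixel is used
    as that pixel's importance score.
    """
    model.eval()
    x = x.clone().requires_grad_(True)   # track gradients w.r.t. the input
    logits = model(x)
    logits[0, target_class].backward()   # backpropagate the target logit only
    # Aggregate over colour channels to get one importance value per pixel.
    return x.grad.detach().abs().max(dim=1).values.squeeze(0)
```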
The goal of this thesis is to showcase the shortcomings of feature importance explanations and to develop explanation techniques that describe DNN behaviour with greater richness. We design an adversarial explanation attack to highlight the infidelity and inadequacy of feature importance explanations. Our attack modifies the parameters of a pre-trained model and uses fairness as a proxy measure for the fidelity of an explanation method, demonstrating that the apparent importance of a feature does not reveal anything reliable about the fairness of the model. Hence, regulators or auditors should not rely on feature importance explanations to measure or enforce standards of fairness.
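To make the attack idea concrete, here is a conceptual sketch, not the exact attack developed in the thesis, of fine-tuning a pre-trained classifier so that gradient-based importance on a designated sensitive input column is suppressed while the model's outputs stay close to the original. `model`, `loader`, and `sensitive_idx` are hypothetical placeholders.

```python
import copy
import torch
import torch.nn.functional as F

def explanation_attack(model, loader, sensitive_idx, epochs=1, lam=10.0):
    """Fine-tune `model` to hide gradient attribution on one input column."""
    original = copy.deepcopy(model).eval()   # frozen reference for the original outputs
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs):
        for x, _ in loader:
            x = x.clone().requires_grad_(True)
            logits = model(x)
            with torch.no_grad():
                target = original(x)
            # Keep the modified model's predictions close to the original's ...
            fidelity = F.mse_loss(logits, target)
            # ... while penalising any gradient attribution on the sensitive feature.
            grads = torch.autograd.grad(logits.sum(), x, create_graph=True)[0]
            attribution = grads[:, sensitive_idx].abs().mean()
            loss = fidelity + lam * attribution
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```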
As one solution, we formulate five levels of semantic richness for evaluating explanations and propose two function decomposition frameworks (DGINN and CME) that extract explanations from DNNs at a semantically higher level than feature importance explanations. Concept-based approaches provide explanations in terms of atomic, human-understandable units (e.g., wheel or door) rather than individual raw features (e.g., pixels or characters). Our function decomposition frameworks can extract specific class representations from 5% of the network parameters and concept representations with an average per-concept F1 score of 86%. Finally, the CME framework makes it possible to compare concept-based explanations, contributing to the scientific rigour of evaluating interpretability methods.

The author gratefully acknowledges the generous sponsorship of the Engineering and Physical Sciences Research Council (EPSRC), the Department of Computer Science and Technology at the University of Cambridge, and Tenyks, Inc.
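As an illustration of the general idea behind concept-based extraction (this is not the CME or DGINN implementation), the sketch below fits a linear probe on hidden-layer activations to predict a binary concept label and reports its per-concept F1 score. `activations` of shape (n_samples, n_units) and binary `concept_labels` of shape (n_samples,) are assumed to be precomputed.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def fit_concept_probe(activations, concept_labels):
    """Fit a linear probe mapping hidden activations to one binary concept."""
    a_train, a_test, c_train, c_test = train_test_split(
        activations, concept_labels, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(a_train, c_train)
    # The held-out F1 score indicates how well the layer encodes the concept.
    return probe, f1_score(c_test, probe.predict(a_test))
```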
You shouldn’t trust me: Learning models which conceal unfairness from multiple explanation methods.
Transparency of algorithmic systems is an important area of research, which has been discussed as a way for end-users and regulators to develop appropriate trust in machine learning models. One popular approach, LIME [23], even suggests that model explanations can answer the question “Why should I trust you?”. Here we show a straightforward method for modifying a pre-trained model to manipulate the output of many popular feature importance explanation methods with little change in accuracy, thus demonstrating the danger of trusting such explanation methods. We show how this explanation attack can mask a model’s discriminatory use of a sensitive feature, raising strong concerns about using such explanation methods to check the fairness of a model.
Is Disentanglement all you need? Comparing Concept-based & Disentanglement Approaches
Concept-based explanations have emerged as a popular way of extracting human-interpretable representations from deep discriminative models. At the same time, the disentanglement learning literature has focused on extracting similar representations in an unsupervised or weakly-supervised way, using deep generative models. Despite the overlapping goals and potential synergies, to our knowledge, there has not yet been a systematic comparison of the limitations and trade-offs between concept-based explanations and disentanglement approaches. In this paper, we give an overview of these fields, comparing and contrasting their properties and behaviours on a diverse set of tasks, and highlighting their potential strengths and limitations. In particular, we demonstrate that state-of-the-art approaches from both classes can be data inefficient, sensitive to the specific nature of the classification/regression task, or sensitive to the employed concept representation.